382 research outputs found

    Global Hierarchical Neural Networks using Hierarchical Softmax

    This paper presents a framework in which hierarchical softmax is used to create a global hierarchical classifier. The approach is applicable to any classification task where there is a natural hierarchy among classes. We show empirical results on four text classification datasets. On all datasets the hierarchical softmax improved on the regular softmax used in a flat classifier in terms of macro-F1 and macro-recall. On three of the four datasets the hierarchical softmax also achieved higher micro-accuracy and macro-precision.
    Comment: Submitted to the 35th Symposium on Applied Computing (SAC'20, https://www.sigapp.org/sac/sac2020/), Machine Learning and its Applications track (MLA, https://sites.google.com/view/acmsac2020/).
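    As a rough illustration of the idea (not the authors' code), the sketch below implements a two-level hierarchical softmax head in PyTorch, where the probability of a leaf class factorizes as p(group) * p(class | group); the layer sizes and group structure are illustrative assumptions.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class TwoLevelHierarchicalSoftmax(nn.Module):
            """p(class) = p(group) * p(class | group) over a fixed two-level hierarchy."""
            def __init__(self, hidden_dim, group_sizes):
                super().__init__()
                self.group_logits = nn.Linear(hidden_dim, len(group_sizes))
                self.leaf_logits = nn.ModuleList(
                    nn.Linear(hidden_dim, n) for n in group_sizes)

            def forward(self, h):
                log_p_group = F.log_softmax(self.group_logits(h), dim=-1)
                parts = []
                for g, leaf in enumerate(self.leaf_logits):
                    # log p(class) = log p(group) + log p(class | group)
                    parts.append(log_p_group[:, g:g + 1] + F.log_softmax(leaf(h), dim=-1))
                return torch.cat(parts, dim=-1)

        head = TwoLevelHierarchicalSoftmax(128, [3, 5, 4])  # 12 leaf classes in 3 groups
        log_probs = head(torch.randn(8, 128))               # (8, 12); each row sums to 1 in probability space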

    Hermes: an Ontology-Based News Personalization Portal

    Nowadays, news feeds give Web users access to a virtually unlimited number of news items, of which only a subset is relevant. Users should therefore be able to select the concepts about which they want to retrieve news. Although keyword-based search engines allow users to filter news items, they lack an understanding of the domain in which the news items reside. This paper proposes a solution that lets users ask for news items related to the specific concepts they are interested in. This is accomplished by creating an ontology, developing a classification system that populates the ontology using a knowledge base, and providing an innovative graph representation of the ontology for retrieving relevant news items. A characteristic feature of our approach is that it considers both concepts and concept relationships when retrieving user-relevant items.
    Keywords: semantic web; news classification; ontologies; OWL; SPARQL; decision support
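    For flavor, the hedged sketch below shows the kind of concept-based retrieval described above, using rdflib to run a SPARQL query over a news ontology; the namespace, file name, concept, and property names are hypothetical stand-ins, not Hermes' actual schema.

        from rdflib import Graph

        g = Graph()
        g.parse("news_ontology.owl")  # assumed populated ontology file

        # Retrieve items mentioning a selected concept or anything related to it,
        # following concept relationships with a SPARQL 1.1 property path.
        query = """
        PREFIX hermes: <http://example.org/hermes#>
        SELECT ?item ?title WHERE {
            ?item hermes:mentionsConcept/hermes:relatedTo* hermes:GooglePixel ;
                  hermes:title ?title .
        }
        """
        for item, title in g.query(query):
            print(item, title)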

    Implicit feature detection for sentiment analysis

    Implicit feature detection is a promising research direction that has not yet received much attention. Building on previous work, in which co-occurrences between notional words and explicit features are used to find implicit features, this research critically reviews the underlying assumptions and proposes a revised algorithm that directly uses the co-occurrences between implicit features and notional words. The revision is shown to perform better than the original method, but both methods are shown to fail in a more realistic scenario.
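    A minimal sketch of the co-occurrence idea, under assumptions: score each candidate implicit feature by how strongly it co-occurs with the notional words of a sentence. The toy training pairs and the scoring rule are illustrative, not the paper's exact algorithm.

        from collections import Counter, defaultdict

        # (notional words of a sentence, implicit feature annotated for it)
        training = [
            ({"cheap", "bargain"}, "price"),
            ({"expensive", "overpriced"}, "price"),
            ({"tasty", "delicious"}, "food"),
        ]

        cooc = defaultdict(Counter)  # word -> co-occurrence counts per implicit feature
        for words, feature in training:
            for w in words:
                cooc[w][feature] += 1

        def detect(words):
            """Pick the implicit feature with the highest co-occurrence score."""
            scores = Counter()
            for w in words & cooc.keys():
                total = sum(cooc[w].values())
                for feature, count in cooc[w].items():
                    scores[feature] += count / total
            return scores.most_common(1)[0][0] if scores else None

        print(detect({"cheap", "overpriced"}))  # -> price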

    Determining the most representative image on a Web page

    We investigate how to determine the most representative image on a Web page. This problem has not been thoroughly investigated and, to date, only expert-based algorithms have been proposed in the literature. We attempt to improve on the performance of known algorithms with the use of Support Vector Machines (SVMs). In addition, our algorithm distinguishes itself from the existing literature by introducing novel image features, including previously unused meta-data protocols. We also design a less restrictive ranking methodology for the image preprocessing stage of our algorithm. We find that applying the SVM framework with our improved classification methodology increases the F1 score from 27.2% to 38.5% compared to a state-of-the-art method. By introducing novel image features and applying backward feature selection, the F1 score rises to 40.0%. Lastly, we use a class-weighted SVM to resolve the imbalance in the number of representative images. This final modification improves the classification performance even further, to 43.9%, outperforming our benchmark algorithms, including those of Facebook and Google. Suggested beneficiaries are the search engine and image retrieval communities, including the commercial sector, owing to the superior performance.
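    The hedged sketch below shows the final modification in scikit-learn terms: a class-weighted SVM trained on an imbalanced set of image feature vectors. The synthetic data and the feature dimensionality are stand-ins for the paper's actual feature set.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import f1_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 6))  # stand-in image features (size, position, meta-data flags, ...)
        y = (X[:, 0] + rng.normal(size=500) > 1.5).astype(int)  # rare positive class: "representative"

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
        clf = SVC(kernel="rbf", class_weight="balanced")  # up-weight the rare class
        clf.fit(X_tr, y_tr)
        print("F1:", f1_score(y_te, clf.predict(X_te)))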

    APFA: Automated Product Feature Alignment for Duplicate Detection

    To keep up with the growing interest in using Web shops for product comparison, we have developed a method that targets the problem of product duplicate detection. If duplicates can be discovered correctly and quickly, customers can compare products in an efficient manner. We build upon the state-of-the-art Multi-component Similarity Method (MSM) for product duplicate detection by developing an automated pre-processing phase that occurs before the similarities between products are calculated. Specifically, in this prior phase the features of products are aligned between Web shops, using metrics such as the data type, coverage, and diversity of each key, as well as the distribution and measurement units of their corresponding values. With this information, the values of these keys can be employed more meaningfully and efficiently in the process of comparing products. Applying our method to a real-world dataset of 1,629 TVs across 4 Web shops, we find that we increase the speed of the product similarity phase by roughly a factor of 3 due to fewer meaningless comparisons, an improved brand analyzer, and a renewed title analyzer. Moreover, in terms of quality of duplicate detection, we significantly outperform MSM, with an F1-measure of 0.746 versus 0.525.
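    A minimal sketch, under assumptions, of the pre-processing idea: profile each feature key per Web shop by its data type, coverage, and value diversity, and align keys whose profiles agree. The matching rule below is illustrative, not APFA's actual scoring.

        def key_profile(products, key):
            """Summarize one feature key over a shop's product listings."""
            values = [p[key] for p in products if key in p]
            return {
                "coverage": len(values) / len(products),
                "diversity": len(set(values)) / max(len(values), 1),
                "numeric": all(str(v).replace(".", "", 1).isdigit() for v in values),
            }

        def aligned(a, b, tol=0.2):
            """Treat two keys as the same feature if their profiles agree."""
            return (a["numeric"] == b["numeric"]
                    and abs(a["coverage"] - b["coverage"]) < tol
                    and abs(a["diversity"] - b["diversity"]) < tol)

        shop_a = [{"Screen Size": "40", "Brand": "X"}, {"Screen Size": "55", "Brand": "Y"}]
        shop_b = [{"Diagonal": "40", "Maker": "X"}, {"Diagonal": "50", "Maker": "Z"}]
        print(aligned(key_profile(shop_a, "Screen Size"),
                      key_profile(shop_b, "Diagonal")))  # -> True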

    Visualizing RDF(S)-based Information


    A Dependency Graph Isomorphism for News Sentence Searching

    Given that the amount of news being published is only increasing, an effective search tool is invaluable to many Web-based companies. With word-based approaches ignoring much of the information in texts, we propose Destiny, a linguistic approach that leverages the syntactic information in sentences by representing sentences as graphs, with disambiguated words as nodes and grammatical relations as edges. Destiny performs approximate sub-graph isomorphism between the query graph and the news sentence graphs, exploiting word synonymy as well as hypernymy. Using a custom corpus of user-rated queries and sentences, the algorithm is evaluated with the normalized Discounted Cumulative Gain, Spearman's Rho, and Mean Average Precision, and Destiny is shown to perform significantly better than a TF-IDF baseline on the considered measures and corpus.
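    The hedged sketch below illustrates the underlying data structure with NetworkX: sentences as directed graphs with words as nodes and grammatical relations as edge labels, matched by subgraph isomorphism. Destiny's matching is approximate and synonym/hypernym-aware; this sketch uses exact word matching as a stand-in.

        import networkx as nx
        from networkx.algorithms.isomorphism import DiGraphMatcher

        def sentence_graph(triples):
            """Build a dependency graph from (head, relation, dependent) triples."""
            g = nx.DiGraph()
            for head, rel, dep in triples:
                g.add_node(head, word=head)
                g.add_node(dep, word=dep)
                g.add_edge(head, dep, rel=rel)
            return g

        news = sentence_graph([("acquired", "nsubj", "Google"),
                               ("acquired", "dobj", "startup"),
                               ("startup", "amod", "small")])
        query = sentence_graph([("acquired", "nsubj", "Google"),
                                ("acquired", "dobj", "startup")])

        # Destiny would relax node_match with WordNet synonyms and hypernyms.
        matcher = DiGraphMatcher(news, query,
                                 node_match=lambda a, b: a["word"] == b["word"],
                                 edge_match=lambda a, b: a["rel"] == b["rel"])
        print(matcher.subgraph_is_isomorphic())  # -> True: the query embeds in the sentence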